When is a crisis really a crisis?

Using NLP and corpus linguistic methods to reveal differences in migration discourse across Czech media

Irene Elmerot
Stockholm University

October 19, 2023
Slovko 2023
Bratislava

Background

  • Exploring Czech media discourse of refugees, asylum seekers, immigrants and migrants (RASIM) in a European migration crisis context, 2015–2023.

  • Addressing a gap in the computational analysis of large datasets of media coverage of migration in the Czech Republic. 

Research design

  • Open, transparent, and replicable study (also for other languages / regions).

  • Investigating differences in language use between Czech mainstream and alternative media:

    • News documents from ≈ 4,000 media outlets can be extracted (via API)
      from the Newton Media Archive, including issues from 2023.
    • Corpus based on terms for voluntary vs. forced migration.
      • Presence of actors in RASIM news (2015–2016 vs. 2022–2023).
      • Dominant collocates of RASIM terms (2015–2016 vs. 2022–2023).

Methodological start

  • Use of combined methods from both Corpus Linguistics (CL) and Natural Language Processing (NLP).

  • Corpus construction and analysis: ~1 million documents (January 2015–February 2023), about 800 million tokens.

  • Documents totalling 156 million tokens were labelled as mainstream or alternative media; these data were cleaned for CL and NLP.

Graphs

  • Proportion of the “Migration in news corpus” in all the categorised news material from Newton Media Archive.

Analysis 1: Word Frequencies

  • Tracked changes in language use over time and across media types using lemma categorization of voluntary vs. forced migration terms and their monthly frequencies.
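
A minimal sketch of how such monthly frequencies could be computed, assuming the documents are already lemmatised and labelled by media type. The lemma lists in VOLUNTARY and FORCED are hypothetical placeholders, not the term lists used in the study, and normalising to instances per million tokens is likewise an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical lemma categories (placeholders, not the study's term lists).
VOLUNTARY = {"migrant", "imigrant"}
FORCED = {"uprchlík", "azylant"}


def monthly_relative_frequencies(docs):
    """Return per-million frequencies of each term category.

    docs: iterable of (month "YYYY-MM", media_type, list_of_lemmas).
    Result: {(month, media_type): {"voluntary": ipm, "forced": ipm}}.
    """
    hits = defaultdict(Counter)   # category hits per (month, media type)
    totals = Counter()            # token totals per (month, media type)
    for month, media_type, lemmas in docs:
        key = (month, media_type)
        totals[key] += len(lemmas)
        for lemma in lemmas:
            if lemma in VOLUNTARY:
                hits[key]["voluntary"] += 1
            elif lemma in FORCED:
                hits[key]["forced"] += 1
    return {
        key: {
            cat: hits[key][cat] * 1_000_000 / total
            for cat in ("voluntary", "forced")
        }
        for key, total in totals.items()
        if total > 0
    }
```

Normalising by the token total of each month and media type keeps the series comparable even though the two media types differ greatly in volume.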

Analysis 2: Named Entity Recognition (NER)

  • Used NameTag 2 model for NER to identify key entities in RASIM discourse.

  • The NER model is designed to identify geographic locations, personal names, media organisations and other institutions.

  • The graphs show alternative media during 2015 and then 2022–2023, followed by mainstream media during the same periods.
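
A minimal sketch of how recognised entities could be tallied per media type and period once the tagger has run. It does not call NameTag 2 itself: the input is assumed to be a list of (entity text, label) pairs, and the mapping from the first letter of a label to the four groups above (SUPERTYPES) is an assumption for illustration, as are the example labels.

```python
from collections import Counter

# Assumed mapping from the first letter of an entity label to a broad group.
SUPERTYPES = {
    "g": "geographic",
    "p": "person",
    "m": "media",
    "i": "institution",
}


def entity_profile(entities, top_n=3):
    """Count entities by broad group and list the most frequent ones.

    entities: iterable of (entity_text, label) pairs from a NER tagger.
    Returns (Counter of groups, list of ((group, text), count)).
    """
    by_group = Counter()
    by_name = Counter()
    for text, label in entities:
        # Labels outside the assumed mapping are collected under "other".
        group = SUPERTYPES.get(label[:1], "other")
        by_group[group] += 1
        by_name[(group, text)] += 1
    return by_group, by_name.most_common(top_n)
```

Running this once per subcorpus (for example alternative media in 2015, then in 2022–2023) would give the counts that graphs like these compare.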

Analysis 3: Collocations

  • Conducted collocation analysis to uncover shifts in term associations in RASIM discourse.

  • Collocation graphs are shown here only for “migrant”, the voluntary migration term.
  • Again, first the alternative media during 2015 and then 2022–2023, followed by mainstream media.
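
A minimal sketch of a window-based collocation ranking, assuming lemmatised sentences as input. The slides do not state which association measure was used; logDice is swapped in here purely as an illustration, and the function name, window size and threshold are hypothetical.

```python
import math
from collections import Counter


def logdice_collocates(sentences, node, window=3, min_cooc=1):
    """Rank collocates of `node` by logDice within a +/- `window` span.

    sentences: iterable of lists of lemmas.
    Returns a list of (collocate, score), highest score first.
    """
    freq = Counter()   # corpus frequency of every lemma
    cooc = Counter()   # co-occurrences with the node inside the window
    for lemmas in sentences:
        freq.update(lemmas)
        for i, lemma in enumerate(lemmas):
            if lemma != node:
                continue
            start = max(0, i - window)
            end = min(len(lemmas), i + window + 1)
            for j in range(start, end):
                if j != i:
                    cooc[lemmas[j]] += 1
    scores = {}
    for collocate, f_xy in cooc.items():
        if f_xy >= min_cooc:
            # logDice = 14 + log2(2 * f_xy / (f_x + f_y))
            dice = 2 * f_xy / (freq[node] + freq[collocate])
            scores[collocate] = 14 + math.log2(dice)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling the function separately for each term (for example "migrant" and "imigrant") and for each subcorpus would let the score of a shared collocate be compared across terms, media types and periods.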

Key Findings

  • Alternative media’s heightened use of voluntary migration terms and references to “the West” and international actors.

  • Shift in collocations of migration terms between the three periods, with a distinct focus on differences between alternative and mainstream media.

  • Emerging media discourse shift during the crisis following February 2022:

    • Increased mentions of specific geographical names (entities).
    • “Illegal” becomes a stronger collocate of “immigrant” than of “migrant”.
    • The word “quota” disappears as a prominent collocate in alternative media.

Conclusions, limitations and possibilities

  • The 2022 crisis was (is?) portrayed as a close reality, while the 2015 crisis was more elusive.

  • Insights into far-right media communication, anti-immigrant sentiments and responsible migration reporting.

  • Calls for further Czech / international migration discourse research and an interdisciplinary approach:

    • Unaddressed verbal aspects (imperfective / perfective).
    • Unaddressed lexical aspects (e.g. metaphors).
    • Other methods (e.g. Cvrček's Companions) may be used.

Acknowledgements

  • Gunvor och Josef Anérs Stiftelse (application number FB22-0088).

Get in touch :)

References

Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273–306. https://doi.org/10.1177/0957926508088962
Cvrček, V., & Fidler, M. (2022). No Keyword is an Island: In search of covert associations. Corpora, 17(2), 259–290. https://doi.org/10.3366/cor.2022.0256
Cvrček, V., Jeziorský, T., & Henyš, J. (2022). ONLINE2_NOW: Monitoring corpus of online Czech. Ústav Českého národního korpusu FF UK, Praha. http://www.korpus.cz
Douglas, P., Cetron, M., & Spiegel, P. (2019). Definitions matter: Migrants, immigrants, asylum seekers and refugees. Journal of Travel Medicine, 26(2), taz005. https://doi.org/10.1093/jtm/taz005
Štětka, V., Mazák, J., & Vochocová, L. (2021). “Nobody Tells us what to Write about”: The Disinformation Media Ecosystem and its Consumers in the Czech Republic. Javnost - The Public, 28(1), 90–109. https://doi.org/10.1080/13183222.2020.1841381
Straka, M. (2018). UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 197–207. https://doi.org/10.18653/v1/K18-2020
Wijffels, J. (2023). Udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit. https://CRAN.R-project.org/package=udpipe